A collection of elementary formulas for calculating the gradients of scalar- and matrix-valued 
functions of one matrix argument is presented. Using some of the well-known properties of the operator 
"trace" on square matrices, alternative definitions of gradients and simple examples of calculating 
them using the product rule and the chain rule for differentiation are treated in an expository fashion 
in both component and matrix notations with emphasis on the latter. Two examples in continuum 
mechanics are presented to illustrate the application of the so-called "matrix calculus" of differentiable 
functions. 

Key words: Chain rule; continuum mechanics; gradient; matrices; matrix calculus; partial differentia- 
tion; product rule; tensor function; trace. 

1. Introduction 

This is an expository article on the use of matrix notation in the elementary calculus of differentiable 
functions whose arguments are square matrices. For example, in continuum physics, it 
is often necessary to work with partial derivatives of a class of functions whose arguments are 
elements of a square matrix and whose values can be either scalars or square matrices of the 
same order. Following the notation and basic concepts of tensor functions as treated by Truesdell 
and Noll [1, pp. 20-35],¹ we present here an elementary introduction to the proper formulation 
of the chain rule and the product rule for differentiation in matrix notation and we include examples, 
formulas and applications to illustrate the two rules. 

The reader is assumed to be familiar with the notions of the trace and the determinant of a 
matrix $A = (A_{ij})$, $i, j = 1, 2, \ldots, n$, i.e., 

$$\operatorname{tr} A = \sum_{i=1}^{n} A_{ii} \quad \text{and} \quad \det A = \sum (-1)^{h} A_{1\sigma_1} A_{2\sigma_2} \cdots A_{n\sigma_n},$$

where the last summation is made over all permutations of $\sigma_1, \sigma_2, \ldots, \sigma_n$, and $h$ is the number 
of interchanges required to restore the natural order.² In particular, the following properties of 
the operator "trace" are applied frequently throughout the paper: 

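(a) $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$; 

(b) $\operatorname{tr}(AB) = \operatorname{tr}(BA)$; 

(c) $\operatorname{tr}(A^T) = \operatorname{tr} A$, where $A^T$ denotes the transpose of $A$; 

(d) $A = B$ if, and only if, $\operatorname{tr}(AC) = \operatorname{tr}(BC)$ for arbitrary matrix $C$. 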
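
The four properties are easy to confirm numerically. The following Python sketch is an illustration only and assumes that NumPy is available. The helper name `recover_from_traces` is ours; it illustrates property (d) by taking for $C$ the matrices with a single unit entry, so that the traces alone determine every element of a matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A, B = rng.standard_normal((2, n, n))   # two random square matrices of order n

# (a) linearity, (b) tr(AB) = tr(BA), (c) invariance under transposition
check_a = np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
check_b = np.isclose(np.trace(A @ B), np.trace(B @ A))
check_c = np.isclose(np.trace(A.T), np.trace(A))

def recover_from_traces(M):
    """Rebuild M from the numbers tr(M E_pq), where E_pq has a single 1 at (p, q).

    Since tr(M E_pq) = M_qp, two matrices with equal traces against every
    test matrix must coincide, which is the content of property (d).
    """
    size = M.shape[0]
    R = np.empty_like(M)
    for p in range(size):
        for q in range(size):
            E = np.zeros_like(M)
            E[p, q] = 1.0
            R[q, p] = np.trace(M @ E)
    return R

check_d = np.allclose(recover_from_traces(A), A)
```

Each of the four flags `check_a` to `check_d` should evaluate to true for any choice of the random seed.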



AMS Subject Classification: Primary 15, 88; Secondary 69. 

1 Figures in brackets indicate the literature references at the end of this paper. 

2 A square matrix is denoted by a symbol underlined with two bars indicating the need for two indices in component notation. In general, any quantity with, say, 
k indices in component notation will be underlined with k bars when the indices are suppressed. For ease of printing, this convention is followed in equations but 
ignored in text. 

2. Gradient of a Scalar Function of a Matrix Argument 

Let $\epsilon = \epsilon(A_{11}, A_{12}, \ldots, A_{1n}, A_{21}, A_{22}, \ldots, A_{2n}, \ldots, A_{n1}, A_{n2}, \ldots, A_{nn})$ define a scalar-valued 
function $\epsilon$ of $n^2$ variables $A_{km}$, $k, m = 1, 2, \ldots, n$, such that the set of variables $A_{km}$ corresponds 
to the set of components of a square matrix $A$ of order $n$. In matrix notation, the definition of the 
scalar function $\epsilon$ assumes the following simple form: 

$$\epsilon = \epsilon(A). \tag{2.1}$$

If $\epsilon$ is differentiable with respect to each variable $A_{km}$, the set of first partial derivatives of 
$\epsilon$, i.e., $\{D_{km}\epsilon;\ k, m = 1, 2, \ldots, n\}$, can be defined as a matrix-valued function to be denoted by $\nabla\epsilon$, 
where the element $(\nabla\epsilon)_{km}$ at the $k$th row and the $m$th column of $\nabla\epsilon$ is given precisely by $D_{km}\epsilon$. 
Let $\epsilon_A$ denote the value of the function $\nabla\epsilon$ for a given $A$; then the definition of the function $\nabla\epsilon$, 
to be known as the gradient of $\epsilon$, can be stated in both component and matrix notations as follows: 

$$\epsilon_A \equiv [(\epsilon_A)_{km}] \equiv [D_{km}\epsilon(A_{pq})] \equiv [(\nabla\epsilon)_{km}(A_{pq})] \equiv \nabla\epsilon(A). \tag{2.2}$$

For brevity, we omit the statement that all indices $k, m, p, q, \ldots$, etc., range from 1 to $n$. 

For the purpose of applying those properties of the operator "trace" as listed in the last 
section, Truesdell and Noll [1] presented an alternative definition of the gradient of a scalar function 
of a matrix argument as follows: 

$$\operatorname{tr}\{\epsilon_A^T C\} = \frac{d}{ds}\,\epsilon(A + sC)\Big|_{s=0}, \tag{2.3a}$$

or, in component notation, 

$$\sum_{k=1}^{n}\sum_{m=1}^{n} (\epsilon_A)_{km} C_{km} = \frac{d}{ds}\,\epsilon(A_{pq} + sC_{pq})\Big|_{s=0}, \tag{2.3b}$$

where $C$, with components $C_{pq}$, is an arbitrary matrix of the same order as the matrix $A$. To see 
that (2.2) and (2.3) are equivalent, we apply the chain rule for differentiation to the expression 
$\epsilon(A_{pq} + sC_{pq})$: 

$$\begin{aligned}
\frac{d}{ds}\,\epsilon(A_{pq} + sC_{pq}) &= \sum_{k=1}^{n}\sum_{m=1}^{n} [D_{km}\epsilon(A_{pq} + sC_{pq})]\left[\frac{d}{ds}(A_{km} + sC_{km})\right] \\
&= \sum_{k=1}^{n}\sum_{m=1}^{n} [D_{km}\epsilon(A_{pq} + sC_{pq})]\,C_{km}.
\end{aligned} \tag{2.4}$$

If we substitute zero for $s$ in (2.4) and apply (2.2), we obtain (2.3b). Conversely, (2.3b) and the 
chain rule imply (2.2). The reader may wish to verify that (2.3) indeed defines a unique matrix 
$\epsilon_A$ as a result of the linearity of the operator "trace" and the arbitrariness of the matrix $C$. 

Example 1: $\epsilon = \epsilon(A) = \det A$. 

To calculate the gradient of $\epsilon$, we apply the Laplace development of a determinant, i.e., 

$$\det A = \sum_{k=1}^{n} A_{km}A^{km}, \quad m \text{ being fixed and not summed}, \tag{2.5}$$

where $A^{km}$ denotes the cofactor of $A_{km}$, defined as $(-1)^{k+m}$ times the complementary minor of 
$A_{km}$. Using (2.2) and the fact that the cofactor $A^{km}$ is independent of the element $A_{km}$ of the matrix 
$A$, we obtain: 

$$\epsilon_A = [(\epsilon_A)_{km}] = [D_{km}(\det A)] = [A^{km}] = A^{\mathrm{cof}}, \tag{2.6}$$

where $A^{\mathrm{cof}}$ denotes the cofactor matrix of $A$ which, by definition, equals the transpose of the adjoint 
matrix of $A$. Let us verify the result given in (2.6) by applying the alternative definition of $\epsilon_A$ as 
given in (2.3): 

$$\begin{aligned}
\operatorname{tr}\{\epsilon_A^T C\} &= \frac{d}{ds}\det(A + sC)\Big|_{s=0} \\
&= \frac{d}{ds}\det[A(1 + sA^{-1}C)]\Big|_{s=0} \\
&= (\det A)\,\frac{d}{ds}\det(1 + sA^{-1}C)\Big|_{s=0}.
\end{aligned} \tag{2.7}$$

Following Truesdell and Noll [1], we introduce another expansion of a determinant: 

$$\det(1 + sB) = 1 + I_1(B)s + I_2(B)s^2 + \cdots + I_n(B)s^n, \tag{2.8}$$

where $B$ is any square matrix of order $n$ and $I_1(B), I_2(B), \ldots, I_n(B)$ are the so-called principal 
invariants of $B$.³ In our case, we are only interested in the first principal invariant $I_1(B)$, which 
equals $\operatorname{tr}(B)$. Combining (2.7) and (2.8), we obtain: 

$$\operatorname{tr}\{\epsilon_A^T C\} = (\det A)\operatorname{tr}(A^{-1}C) = \operatorname{tr}\{(\det A)A^{-1}C\}, \quad \text{i.e.,} \quad \epsilon_A = (\det A)(A^{-1})^T. \tag{2.9}$$

Since $A^{-1} = (\det A)^{-1}(A^{\mathrm{cof}})^T$, we see immediately that (2.9) is equivalent to (2.6), and that both 
definitions given in (2.2) and (2.3) yield the same result. 
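
The agreement between the two definitions can also be observed numerically. The sketch below is our own illustration, assuming NumPy; the function names `cofactor_matrix` and `directional_derivative` are hypothetical helpers, and the derivative with respect to $s$ is only approximated by a central difference.

```python
import numpy as np

def cofactor_matrix(A):
    """Cofactor matrix: entry (k, m) is (-1)**(k+m) times the minor of A[k, m]."""
    n = A.shape[0]
    cof = np.empty_like(A)
    for k in range(n):
        for m in range(n):
            minor = np.delete(np.delete(A, k, axis=0), m, axis=1)
            cof[k, m] = (-1) ** (k + m) * np.linalg.det(minor)
    return cof

def directional_derivative(eps, A, C, h=1e-6):
    """Central-difference estimate of d/ds eps(A + s C) at s = 0."""
    return (eps(A + h * C) - eps(A - h * C)) / (2.0 * h)

rng = np.random.default_rng(1)
# The shift by 3 * identity is an assumption that makes a singular A unlikely.
A = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)
C = rng.standard_normal((3, 3))

grad_cof = cofactor_matrix(A)                        # gradient according to (2.6)
grad_inv = np.linalg.det(A) * np.linalg.inv(A).T     # gradient according to (2.9)
lhs = np.trace(grad_inv.T @ C)                       # left side of (2.3a)
rhs = directional_derivative(np.linalg.det, A, C)    # right side of (2.3a)
```

Up to rounding and the finite-difference error, `grad_cof` and `grad_inv` should coincide, and `lhs` should equal `rhs`.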

Example 2: $\epsilon = \epsilon(A) = \operatorname{tr}(A^m)$, $m$ being any positive integer. 

Since we have yet to introduce the notion of the gradient of a matrix-valued function, we 
must rule out the possibility of calculating the gradient of $\epsilon$ using the chain rule. To apply the 
definition of the gradient of $\epsilon$ as given in (2.2), it is necessary to develop an expansion of the function 
$\epsilon$ in terms of the components of $A$. We observe that for arbitrary positive integer $m$, the expansion 
of the matrix $A^m$ is cumbersome, and it is not practical to find the gradient of $\epsilon$ using (2.2). 

However, the definition given in (2.3) does lead us to an answer: 

$$\begin{aligned}
\operatorname{tr}\{\epsilon_A^T C\} &= \frac{d}{ds}\operatorname{tr}\{(A + sC)^m\}\Big|_{s=0} \\
&= \frac{d}{ds}\operatorname{tr}\{A^m + (A^{m-1}C + A^{m-2}CA + \cdots + CA^{m-1})s + \cdots\}\Big|_{s=0} \\
&= \operatorname{tr}\{A^{m-1}C + A^{m-2}CA + \cdots + CA^{m-1}\} \\
&= \operatorname{tr}\{mA^{m-1}C\},
\end{aligned} \tag{2.10}$$

since the trace operator is linear and $\operatorname{tr}(AB) = \operatorname{tr}(BA)$. Since $C$ is arbitrary, we conclude that 

$$\epsilon_A = m\,(A^{m-1})^T.$$

3 For a rigorous exposition of the notion of a principal invariant of a matrix or a second order tensor, see Ericksen [2, p. 832]. 

Example 3: $\epsilon = \epsilon(A) = \det(A^2 + B)$, $B$ being a constant matrix. 

In this case, neither of the definitions given in (2.2) and (2.3) is practical for evaluating the 
gradient of the scalar function $\epsilon$. The only reasonable alternative is to use the chain rule in conjunction 
with a practical way of evaluating the gradient of a matrix-valued function, to be presented 
in the next section. 

3. Gradient of a Matrix-Valued Function of a Matrix Argument 

Let $f = f(A)$ define a matrix-valued function $f$ of a matrix argument $A$, where both $f$ and $A$ are 
square matrices of order $n$ with components $f_{km}$ and $A_{rs}$ respectively, and the $n^2$ component functions 
$f_{km}$ of $f$ are defined as follows: 

$$f_{km} = f_{km}(A_{rs}) = f_{km}(A). \tag{3.1}$$

If each component function $f_{km}$ is differentiable, the set of the first partial derivatives of 
$f_{km}$, i.e., $\{D_{pq}f_{km}\}$, can be defined as the gradient of the function $f$, to be denoted by $\nabla f$. To emphasize 
the need for four indices to specify $f_A$, which stands for the value of $\nabla f$ for a given $A$, we 
introduce the unusual four-bar notation as it appears in the following definition: 

$$\underline{\underline{\underline{\underline{f_A}}}} \equiv [(f_A)_{kmpq}] \equiv [D_{pq}f_{km}(A_{rs})] \equiv [(\nabla f)_{kmpq}(A_{rs})] \equiv \nabla f(A). \tag{3.2}$$

Clearly $f_A$ is not a square matrix in the usual sense and, therefore, is not suitable for calculations 
in matrix notation. Following [1], we introduce the so-called contraction operation on $f_A$ with respect 
to an arbitrary square matrix $C$ whose order is the same as that of $A$: 

$$f_A[C] \equiv \left[\sum_{p=1}^{n}\sum_{q=1}^{n} (f_A)_{kmpq}\,C_{pq}\right]. \tag{3.3}$$

The new quantity, $f_A[C]$, to be known as the gradient of $f$ with respect to $A$ and contracted with 
$C$, requires only two indices for component representation. Hence the symbol $f_A[C]$ will replace 
$f_A$ wherever matrix operations are used. 

The definition of the gradient of / as given in (3.2) is equivalent to the following alternative 
definition based on the chain rule: 

$$f_A[C] = \frac{d}{ds}\,f(A + sC)\Big|_{s=0}, \tag{3.4a}$$

or, in component notation, 

$$\sum_{p=1}^{n}\sum_{q=1}^{n} (f_A)_{kmpq}\,C_{pq} = \frac{d}{ds}\,f_{km}(A_{rt} + sC_{rt})\Big|_{s=0}. \tag{3.4b}$$
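
To make the equivalence of (3.2) and (3.4) concrete, the following Python sketch (our own illustration, assuming NumPy) builds the four-index array $(f_A)_{kmpq}$ by finite differences, contracts it with $C$ as in (3.3), and compares the result with the directional derivative in (3.4a). The test function $f(A) = A^2$ and all function names are assumptions made for the example.

```python
import numpy as np

def square(X):
    """Illustrative matrix-valued function f(A) = A A (our choice)."""
    return X @ X

def four_index_gradient(f, A, h=1e-6):
    """Central-difference estimate of (f_A)_{kmpq} = D_pq f_km, cf. (3.2)."""
    n = A.shape[0]
    G = np.empty((n, n, n, n))
    for p in range(n):
        for q in range(n):
            E = np.zeros((n, n))
            E[p, q] = h                      # perturb the single element A_pq
            G[:, :, p, q] = (f(A + E) - f(A - E)) / (2.0 * h)
    return G

def contracted_gradient(f, A, C, h=1e-6):
    """Central-difference estimate of f_A[C] = d/ds f(A + s C) at s = 0, cf. (3.4a)."""
    return (f(A + h * C) - f(A - h * C)) / (2.0 * h)

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

G = four_index_gradient(square, A)
via_contraction = np.einsum("kmpq,pq->km", G, C)    # double sum in (3.3)
via_direction = contracted_gradient(square, A, C)   # definition (3.4a)
```

For this particular $f$ both arrays should also agree with $AC + CA$, which is what the product rule gives for $A^2$.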

As a rule, both definitions given in (3.2) and (3.4) are useful for simple matrix-valued functions 
such as those listed below: 

$$f(A) = A; \quad f_A[C] = C. \tag{3.5}$$

$$f(A) = A^T; \quad f_A[C] = C^T. \tag{3.6}$$

$$f(A) = BA, \ B \text{ being a constant matrix}; \quad f_A[C] = BC. \tag{3.7}$$

$$f(A) = AB, \ B \text{ being a constant matrix}; \quad f_A[C] = CB. \tag{3.8}$$

For moderately complicated matrix-valued functions such as $f(A) = A^m$, $m$ being a positive integer 
greater than 1, the matrix definition given by (3.4a) is far superior and sometimes becomes the 
sole means of evaluating the gradient of a matrix-valued function. The reader can easily verify, 
using (3.4a), the following useful result (note: $A^0 = 1$): 

$$f(A) = A^m, \ m = 2, 3, 4, \ldots; \quad f_A[C] = \sum_{i=0}^{m-1} A^i C A^{m-1-i}. \tag{3.9}$$
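
A quick numerical check of (3.9) may be helpful. The sketch below is an illustration under our own assumptions (NumPy, a random $3 \times 3$ matrix, $m = 4$, and hypothetical helper names); it compares the sum in (3.9) with a central-difference estimate of (3.4a).

```python
import numpy as np
from numpy.linalg import matrix_power

def power_gradient(A, C, m):
    """Right side of (3.9): sum over i = 0..m-1 of A^i C A^(m-1-i)."""
    return sum(matrix_power(A, i) @ C @ matrix_power(A, m - 1 - i) for i in range(m))

def directional_derivative(f, A, C, h=1e-6):
    """Central-difference estimate of d/ds f(A + s C) at s = 0."""
    return (f(A + h * C) - f(A - h * C)) / (2.0 * h)

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))
m = 4

formula = power_gradient(A, C, m)
numeric = directional_derivative(lambda X: matrix_power(X, m), A, C)

# Taking the trace of (3.9) gives tr(m A^(m-1) C), the last line of (2.10).
trace_of_formula = np.trace(formula)
trace_from_2_10 = m * np.trace(matrix_power(A, m - 1) @ C)
```

Because the terms of the sum do not commute, `formula` generally differs from $mA^{m-1}C$ itself; only the traces agree.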

4. Product Rule for Differentiation of Matrix-valued Functions 

Let $f$ be the product of two matrix-valued functions $g$ and $h$ with $f = f(A) = g(A)h(A) = gh$, 
where the order of the matrix multiplication is important. The product rule for partial differentiation 
yields the gradient of $f$ in the following matrix notation: 

$$f_A[C] = g_A[C]\,h + g\,h_A[C] \quad \text{for } f = gh. \tag{4.1}$$

Using the elementary formulas given in (3.5) and (3.6), we obtain immediately the following formula 
based on (4.1): 

$$f(A) = A^TA; \quad f_A[C] = C^TA + A^TC. \tag{4.2}$$

To derive the formula for the gradient of the matrix inversion operator, we apply the product 
rule to the identity $A^{-1}A = 1$: 

$$f(A) = A^{-1}; \quad f_A[C]\,A + A^{-1}C = 0, \ \text{i.e.,} \ f_A[C] = -A^{-1}CA^{-1}. \tag{4.3}$$

Using the product rule and (4.3), the reader can easily verify by induction: 

$$f(A) = A^{-m}, \ m = 2, 3, \ldots; \quad f_A[C] = -\sum_{i=0}^{m-1} A^{-m+i}\,C\,A^{-i-1}. \tag{4.4}$$

Whenever the inverse of a matrix is mentioned, the restriction to the class of square matrices 
with nonzero determinants will be understood. 
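
The formulas (4.2) to (4.4) can be checked in the same spirit. The following Python sketch is ours and assumes NumPy; the matrix $A$ is shifted by a multiple of the identity only to make a vanishing determinant unlikely, and $m = 3$ is an arbitrary choice.

```python
import numpy as np
from numpy.linalg import inv, matrix_power

def directional_derivative(f, A, C, h=1e-6):
    """Central-difference estimate of f_A[C] = d/ds f(A + s C) at s = 0."""
    return (f(A + h * C) - f(A - h * C)) / (2.0 * h)

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)   # assumption: keeps A away from singular
C = rng.standard_normal((3, 3))
Ainv = inv(A)
m = 3

# (4.2): f(A) = A^T A
num_gram = directional_derivative(lambda X: X.T @ X, A, C)
ana_gram = C.T @ A + A.T @ C

# (4.3): f(A) = A^{-1}
num_inv = directional_derivative(inv, A, C)
ana_inv = -Ainv @ C @ Ainv

# (4.4): f(A) = A^{-m}; note A^{-m+i} = (A^{-1})^(m-i) and A^{-i-1} = (A^{-1})^(i+1)
num_neg = directional_derivative(lambda X: matrix_power(inv(X), m), A, C)
ana_neg = -sum(matrix_power(Ainv, m - i) @ C @ matrix_power(Ainv, i + 1) for i in range(m))
```

Each numerical estimate should match its closed-form counterpart to within the finite-difference error.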

5. Chain Rule for Differentiation of Scalar- and Matrix-Valued Functions 

Consider a scalar-valued function $\epsilon$ of a matrix argument $A$ whose components $A_{km}$ are functions 
of a single scalar parameter $t$. The chain rule for differentiation with respect to $t$ assumes 
the following form in component notation: 

$$\eta(t) = \epsilon(A_{km}(t)); \quad \dot\eta(t) = \sum_{k=1}^{n}\sum_{m=1}^{n} (\epsilon_A)_{km}\,\dot A_{km}(t), \tag{5.1a}$$

where the dot symbol denotes the operator $d/dt$. In matrix notation, (5.1a) becomes 

$$\eta(t) = \epsilon(A(t)); \quad \dot\eta(t) = \operatorname{tr}(\epsilon_A^T\,\dot A(t)). \tag{5.1b}$$
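
As an illustration of (5.1b), the sketch below takes $\epsilon = \det$ with the gradient (2.9) and differentiates along a one-parameter family of matrices. The path $A(t) = A_0 + \sin(t)\,A_1$, the two constant matrices, and the function names are assumptions chosen only for the example; NumPy is assumed.

```python
import numpy as np

A0 = np.array([[2.0, 1.0, 0.0],
               [0.0, 3.0, 1.0],
               [1.0, 0.0, 2.0]])
A1 = np.array([[0.0, 1.0, 2.0],
               [1.0, 0.0, 1.0],
               [2.0, 1.0, 0.0]])

def A_of_t(t):
    """Hypothetical path A(t) = A0 + sin(t) * A1."""
    return A0 + np.sin(t) * A1

def A_dot(t):
    """Exact derivative of the path with respect to t."""
    return np.cos(t) * A1

def eta(t):
    """eta(t) = eps(A(t)) with eps = det."""
    return np.linalg.det(A_of_t(t))

t, h = 0.3, 1e-6
A = A_of_t(t)
eps_A = np.linalg.det(A) * np.linalg.inv(A).T          # gradient of det, (2.9)
chain_rule = np.trace(eps_A.T @ A_dot(t))              # right side of (5.1b)
finite_diff = (eta(t + h) - eta(t - h)) / (2.0 * h)    # direct estimate of d(eta)/dt
```

The two numbers `chain_rule` and `finite_diff` should agree to several significant figures.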



In applications, it is common to work with a scalar-valued function $\phi$ of a matrix argument 
$f$ which depends on another matrix argument $A$. The chain rule for differentiation with respect to 
$A$ can be written in the following component notation: 

$$\epsilon(A_{rt}) = \phi(f_{uv}(A_{ij})); \quad (\epsilon_A)_{pq} = \sum_{k=1}^{n}\sum_{m=1}^{n} (\phi_f)_{km}\,(f_A)_{kmpq}. \tag{5.2a}$$

To write (5.2a) in matrix notation, let us contract both sides of (5.2a) with an arbitrary matrix $C$: 

$$\epsilon(A) = \phi(f(A)); \quad \operatorname{tr}(\epsilon_A^T C) = \operatorname{tr}(\phi_f^T\,f_A[C]). \tag{5.2b}$$

Returning to Example 3 given in section 2, we are now equipped to evaluate the gradient of the 
function $\epsilon$ defined by $\epsilon = \epsilon(A) = \det(A^2 + B)$, $B$ being a constant matrix. Using (2.9), (3.9) and the 
chain rule given by (5.2b), we have: 

$$\operatorname{tr}\{\epsilon_A^T C\} = \operatorname{tr}\{\det(A^2 + B)\,(A^2 + B)^{-1}(AC + CA)\}.$$

Since $C$ is arbitrary, we have $\epsilon_A^T = \det(A^2 + B)\,[(A^2 + B)^{-1}A + A(A^2 + B)^{-1}]$. 
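
The result for $\det(A^2 + B)$ can be compared with a gradient assembled element by element from definition (2.2). The following Python sketch is an illustration under our own assumptions: NumPy, a particular $3 \times 3$ matrix $A$, the constant matrix $B = 4 \cdot 1$, and hypothetical function names.

```python
import numpy as np

def eps(A, B):
    """eps(A) = det(A^2 + B)."""
    return np.linalg.det(A @ A + B)

def eps_gradient(A, B):
    """Closed form: eps_A is the transpose of det(M) [M^{-1} A + A M^{-1}], M = A^2 + B."""
    M = A @ A + B
    Minv = np.linalg.inv(M)
    return (np.linalg.det(M) * (Minv @ A + A @ Minv)).T

def numerical_gradient(fun, A, h=1e-6):
    """Entry (k, m) is a central-difference estimate of D_km fun, cf. (2.2)."""
    G = np.empty_like(A)
    for k in range(A.shape[0]):
        for m in range(A.shape[1]):
            E = np.zeros_like(A)
            E[k, m] = h
            G[k, m] = (fun(A + E) - fun(A - E)) / (2.0 * h)
    return G

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
B = 4.0 * np.eye(3)          # assumed constant matrix; A^2 + B is invertible for this A

analytic = eps_gradient(A, B)
numeric = numerical_gradient(lambda X: eps(X, B), A)
```

The transpose in `eps_gradient` matters here, since $A$ is not symmetric and the bracketed expression is $\epsilon_A^T$ rather than $\epsilon_A$.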

6. A Collection of Some Elementary Formulas in "Matrix Calculus" 

Based on the product rule and the chain rule for differentiation in matrix notation as presented 
in the last two sections, a calculus of differentiable functions of square matrices, to be referred 
to as the "matrix calculus," can be defined in an analogous way as the elementary theory of calculus 
based on the field of real or complex numbers. Obviously, for matrix calculus, the underlying 
mathematical object is not a field, but a noncommutative ring, i.e., the ring of square matrices of 
order $n$ over the familiar ring of differentiable functions. An excellent account of the theory of 
matrices over rings was given recently by Newman [3], but here we merely present a collection of 
some elementary formulas in "matrix calculus" without studying its mathematical structure. 
For the convenience of the reader, the table below lists some of the most commonly used 
formulas in matrix calculus. 

| Formula number | Function | Gradient | Remark |
|---|---|---|---|
| 1 | $f(A) = A$ | $f_A[C] = C$ | |
| 2 | $f(A) = A^T$ | $C^T$ | |
| 3 | $f(A) = BA$ | $BC$ | $B$ being a constant matrix. |
| 4 | $f(A) = AB$ | $CB$ | $B$ being a constant matrix. |
| 5 | $f(A) = A^m$ | $\sum_{i=0}^{m-1} A^i C A^{m-1-i}$ | $m = 2, 3, 4, \ldots$ |
| 6 | $f(A) = A^TA$ | $C^TA + A^TC$ | |
| 7 | $f(A) = A^{-1}$ | $-A^{-1}CA^{-1}$ | |
| 8 | $f(A) = A^{-m}$ | $-\sum_{i=0}^{m-1} A^{-m+i} C A^{-i-1}$ | $m = 2, 3, 4, \ldots$ |
| 9 | $\epsilon(A) = \operatorname{tr} A$ | $\epsilon_A = 1$ | First invariant. |
| 10 | $\epsilon(A) = \det A$ | $\det A\,(A^{-1})^T$ | $n$th invariant. |
| 11 | $\epsilon(A) = \phi(A^T)$ | $(\phi_{A^T})^T$ | |
| 12 | $\epsilon(A) = \phi(BAD)$ | $B^T(\phi_{BAD})D^T$ | $B$, $D$ being constant matrices. |
| 13 | $\epsilon(A) = \phi(A^{-1})$ | $-(A^{-1})^T(\phi_{A^{-1}})(A^{-1})^T$ | |
| 14 | $\epsilon(A) = \phi(AA^T)$ | $2(\phi_{AA^T})A$ | Note $(\phi_{AA^T})^T = (\phi_{AA^T})$. |
| 15 | $\epsilon(A) = \phi(A^TA)$ | $2A(\phi_{A^TA})$ | Note $(\phi_{A^TA})^T = (\phi_{A^TA})$. |
| 16 | $\epsilon(A) = \operatorname{tr} A^{-1} \det A$ | $\det A\,(A^{-1})^T(1 \operatorname{tr} A^{-1} - (A^{-1})^T)$ | Second invariant for $n = 3$. |

Using the properties of the operator "trace" as listed in section 1, we observe that the deriva- 
tion for formulas Nos. 11-15 presents no difficulty. For example, formula No. 13 can be derived 
as follows: 
From (5.2b) and formula 7, we have 

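$$\operatorname{tr}(\epsilon_A^T C) = \operatorname{tr}(-(\phi_{A^{-1}})^T A^{-1}CA^{-1}) = \operatorname{tr}(-A^{-1}(\phi_{A^{-1}})^T A^{-1}C). \tag{6.1}$$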

Since C is arbitrary, we obtain immediately the desired result. 
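
A few of the scalar formulas can be spot-checked numerically. In the Python sketch below (ours, assuming NumPy), the test function $\phi(X) = \operatorname{tr}(X^3)$, whose gradient $3(X^2)^T$ follows from (2.10), and the particular matrices $A$, $B$, $D$ are assumptions made for illustration; formulas Nos. 12, 13 and 16 are compared with gradients computed element by element.

```python
import numpy as np

def numerical_gradient(fun, A, h=1e-6):
    """Entry (k, m) is a central-difference estimate of D_km fun, cf. (2.2)."""
    G = np.empty_like(A)
    for k in range(A.shape[0]):
        for m in range(A.shape[1]):
            E = np.zeros_like(A)
            E[k, m] = h
            G[k, m] = (fun(A + E) - fun(A - E)) / (2.0 * h)
    return G

def phi(X):
    """Hypothetical scalar test function phi(X) = tr(X^3)."""
    return np.trace(X @ X @ X)

def phi_grad(X):
    """Gradient of phi, i.e. 3 (X^2)^T, by the result of (2.10) with m = 3."""
    return 3.0 * (X @ X).T

A = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 4.0]])          # det A = 25, so A is invertible
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
Ainv = np.linalg.inv(A)
one = np.eye(3)

# Formula No. 12: eps(A) = phi(B A D)
formula_12 = B.T @ phi_grad(B @ A @ D) @ D.T
numeric_12 = numerical_gradient(lambda X: phi(B @ X @ D), A)

# Formula No. 13: eps(A) = phi(A^{-1})
formula_13 = -Ainv.T @ phi_grad(Ainv) @ Ainv.T
numeric_13 = numerical_gradient(lambda X: phi(np.linalg.inv(X)), A)

# Formula No. 16: eps(A) = tr(A^{-1}) det A
formula_16 = np.linalg.det(A) * Ainv.T @ (np.trace(Ainv) * one - Ainv.T)
numeric_16 = numerical_gradient(lambda X: np.trace(np.linalg.inv(X)) * np.linalg.det(X), A)
```

Formulas Nos. 14 and 15 could be treated the same way, provided the chosen $\phi$ has a symmetric gradient at the symmetric arguments $AA^T$ and $A^TA$, as the remarks in the table require.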

7. Applications 

To illustrate the ease with which certain problems in continuum physics can be treated by 
using some of the formulas listed in the last section, we shall present two examples in continuum 
mechanics: 
Example 1: (All indices $i, j, k, m, p$, etc. range from 1 to 3.) 

Let the material coordinates of a particle in a continuous body be denoted by $X^i$. Let the 
position coordinates of the same particle at time $t$ be given by $x^k = x^k(X^i, t)$. Two basic quantities 
can be defined: 

$$F \equiv [F^k{}_i(X^j, t)] \equiv \left[\frac{\partial x^k}{\partial X^i}(X^j, t)\right] \quad \text{(deformation gradient)}; \tag{7.1}$$

$$v \equiv [v^k(X^j, t)] \equiv \left[\frac{\partial x^k}{\partial t}(X^j, t)\right] \quad \text{(velocity vector)}. \tag{7.2}$$

It is useful to express $X^i$ as functions of $x^k$ and $t$ so that the velocity components have the alternative 
representation $v^k = v^k(x^m, t)$. This allows us to define another useful quantity: 

$$L \equiv [L_m{}^k(x^p, t)] \equiv \left[\frac{\partial v^k}{\partial x^m}(x^p, t)\right] \quad \text{(velocity gradient)}. \tag{7.3}$$

An important relation to be needed later follows immediately from the above definitions and the 
interchange of the order of partial differentiations: 

$$\dot F = L^T F, \tag{7.4}$$

where the dot symbol denotes the partial derivative with respect to $t$ holding the material coordinates 
$X^i$ constant. 

The notion of "mass" of a continuous body leads to two notions of "mass density", namely, 
the mass density $\rho_R$ with respect to a unit volume in the reference configuration where each particle 
is labeled with the material coordinates $X^i$, and the mass density $\rho_t$ with respect to a unit volume 
in the spatial configuration at time $t$ where each particle is observed to occupy the position at 
coordinates $x^k$. The two mass densities are of course related: 

$$\rho_R = \rho_t \det F. \tag{7.5}$$

The law of the conservation of mass states that $\dot\rho_R = 0$. Using (5.1b), formula No. 10, and the 
relation (7.4), we obtain the well known "equation of continuity" in classical mechanics (note: 
$\operatorname{div} v \equiv \sum_{k=1}^{3} \partial v^k/\partial x^k$): 

$$\begin{aligned}
\dot\rho_R &= \dot\rho_t \det F + \rho_t\,(\det F)^{\cdot} \\
&= \dot\rho_t \det F + \rho_t \operatorname{tr}(\det F\,F^{-1}\dot F) \\
&= \dot\rho_t \det F + \rho_t \det F \operatorname{tr}(F^{-1}L^TF) \\
&= \dot\rho_t \det F + \rho_t \det F \operatorname{tr}(L^T) \\
&= \det F\,(\dot\rho_t + \rho_t \operatorname{div} v).
\end{aligned}$$

Since $\det F \neq 0$, $\dot\rho_R = 0$ implies $\dot\rho_t + \rho_t \operatorname{div} v = 0$. 
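
The equation of continuity can be observed numerically for a simple motion. The sketch below is our own illustration, assuming NumPy and a hypothetical homogeneous motion $x = F(t)X$ with $F(t) = F_0 + tF_1$; the matrices $F_0$, $F_1$ and the reference density are made-up values. The velocity gradient enters only through $L^T = \dot F F^{-1}$, which is (7.4) solved for $L^T$.

```python
import numpy as np

F0 = np.array([[1.0, 0.2, 0.0],
               [0.0, 1.0, 0.3],
               [0.1, 0.0, 1.0]])
F1 = np.array([[0.5, 0.0, 0.1],
               [0.2, 0.3, 0.0],
               [0.0, 0.1, 0.4]])
rho_R = 2.0                      # hypothetical constant reference density

def F(t):
    """Deformation gradient of the assumed motion, F(t) = F0 + t * F1."""
    return F0 + t * F1

def rho(t):
    """Spatial mass density from (7.5): rho_t = rho_R / det F."""
    return rho_R / np.linalg.det(F(t))

t, h = 0.5, 1e-6
F_dot = F1                                   # time derivative of F at fixed X
L_T = F_dot @ np.linalg.inv(F(t))            # (7.4) solved for L^T
div_v = np.trace(L_T)                        # tr(L^T) = tr(L) = div v
rho_dot = (rho(t + h) - rho(t - h)) / (2.0 * h)
residual = rho_dot + rho(t) * div_v          # should vanish by the equation of continuity
```

The quantity `residual` should be zero up to rounding and finite-difference error.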

Example 2: 

One of the principles generally associated with the correct formulation of the constitutive 
equation of a material is known as the "Principle of Material Indifference" which means physically 
that the response of a material is independent of the observer. Let us confine our attention to 
"simple fluids" in the sense of Truesdell and Noll [1], where the most general constitutive equation 
may be written in the following form: 

$$T(t) = \mathfrak{F}[C_t(\tau); \rho_t]. \tag{7.6}$$

Here $T(t)$ is the Cauchy stress at time $t$, $C_t(\tau)$ is the relative right Cauchy-Green tensor defined by 

$$(C_t(\tau))_{km} \equiv \sum_{p=1}^{3} \frac{\partial x^p(\tau)}{\partial x^k(t)}\,\frac{\partial x^p(\tau)}{\partial x^m(t)},$$

$\rho_t$ is the mass density at time $t$, and $\mathfrak{F}$ is a functional of the history 
of $C_t(\tau)$, $-\infty < \tau \le t$, with a parametric dependence on $\rho_t$. The principle of material indifference 
requires that the functional $\mathfrak{F}$ satisfies the following relation for an arbitrary orthogonal matrix $Q$: 

$$\mathfrak{F}[QC_t(\tau)Q^T; \rho_t] = Q\,\mathfrak{F}[C_t(\tau); \rho_t]\,Q^T. \tag{7.7}$$

Consider now the following constitutive equation of a simple fluid:⁴ 

$$T(t) = -p(\rho_t)\,1 + \rho_t \int_{-\infty}^{t} C_t(\tau)\,U_C(C_t(\tau); t - \tau)\,d\tau, \tag{7.8}$$

where $p$ is a scalar-valued function of the mass density $\rho_t$, and $U$ is a scalar-valued function of one 
matrix argument $C_t(\tau)$ and one scalar variable $t - \tau$, i.e., $U = U(C_t(\tau); t - \tau)$. Furthermore, 
$U$ is required to satisfy the following condition for an arbitrary orthogonal $Q$: 

$$U(QC_t(\tau)Q^T; t - \tau) = U(C_t(\tau); t - \tau). \tag{7.9}$$

Differentiating eq (7.9) with respect to $C_t(\tau)$ and applying formula No. 12 as listed in the 
last section, we obtain 

$$Q^T\,U_C(QC_t(\tau)Q^T; t - \tau)\,Q = U_C(C_t(\tau); t - \tau), \tag{7.10}$$

where $U_C$ denotes the partial gradient of $U$ with respect to $C_t(\tau)$. We are now ready to show that 
eq (7.8), indeed, satisfies the principle of material indifference. Using (7.6) and (7.8), we calculate 
the left-hand side of (7.7): 

$$\mathfrak{F}[QC_t(\tau)Q^T; \rho_t] = -p(\rho_t)\,1 + \rho_t \int_{-\infty}^{t} QC_t(\tau)Q^T\,U_C(QC_t(\tau)Q^T; t - \tau)\,d\tau.$$

Substituting (7.10) into the above equation, we obtain 

$$\begin{aligned}
\mathfrak{F}[QC_t(\tau)Q^T; \rho_t] &= -p(\rho_t)\,1 + \rho_t \int_{-\infty}^{t} QC_t(\tau)Q^T\,Q\,U_C(C_t(\tau); t - \tau)\,Q^T\,d\tau \\
&= Q\,\mathfrak{F}[C_t(\tau); \rho_t]\,Q^T = \text{R.H.S. of (7.7). Q.E.D.}
\end{aligned}$$

4 The theory of such a fluid was proposed by Bernstein, Kearsley and Zapas [4]. Additional results on the same theory including the derivation of equation (7.8) 
appeared in a recent manuscript by Fong and Simmons [5]. 
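
The key step of the argument, namely that the invariance (7.9) of $U$ implies the transformation rule (7.10) for its gradient, can be illustrated with a concrete potential. In the Python sketch below (ours, assuming NumPy), the function $U(C; s) = e^{-s}\operatorname{tr}(C^2)$ is a hypothetical choice that satisfies (7.9); it is not the potential of the theory cited in the text. The last lines check that the integrand of (7.8) transforms as required by (7.7).

```python
import numpy as np

def U(C, s):
    """Hypothetical isotropic potential U(C; s) = exp(-s) tr(C^2), with s = t - tau."""
    return np.exp(-s) * np.trace(C @ C)

def U_C(C, s):
    """Gradient of U with respect to C, from Example 2 of section 2 with m = 2."""
    return 2.0 * np.exp(-s) * C.T

def integrand(C, s):
    """Integrand of (7.8): C U_C(C; s)."""
    return C @ U_C(C, s)

rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # orthogonal factor of a random matrix
G = rng.standard_normal((3, 3))
C = G.T @ G                                        # symmetric, like a Cauchy-Green tensor
s = 0.7
C_rot = Q @ C @ Q.T

lhs_79, rhs_79 = U(C_rot, s), U(C, s)              # two sides of (7.9)
lhs_710 = Q.T @ U_C(C_rot, s) @ Q                  # left side of (7.10)
rhs_710 = U_C(C, s)                                # right side of (7.10)
lhs_77 = integrand(C_rot, s)                       # integrand evaluated at Q C Q^T
rhs_77 = Q @ integrand(C, s) @ Q.T                 # Q (integrand at C) Q^T
```

Since the term $-p(\rho_t)\,1$ is unchanged by $Q(\cdot)Q^T$, agreement of `lhs_77` and `rhs_77` for every $\tau$ is what carries (7.7) through the integral.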



It is clear from the above two examples that the advantage of adopting the matrix notation 
and applying formulas in "matrix calculus" lies mainly in the elegance in which higher-dimensional 
problems in continuum physics can be formulated. It is also clear that even though our list of 
formulas was prepared for functions of one matrix argument, their applications can be easily ex- 
tended to functions of several matrix arguments. 



I wish to thank Dick Kraft, John A. Simmons, Seldon L. Stewart and Justin C. Walker for 
many helpful comments. 